22 research outputs found

    XED : A Multilingual Dataset for Sentiment Analysis and Emotion Detection

    We introduce XED, a multilingual fine-grained human-annotated emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 43 additional languages, providing new resources to many low-resource languages. We use Plutchik’s core emotions to annotate the dataset with the addition of neutral. The dataset is carefully evaluated using language-specific BERT to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection. Peer reviewed.

    LT@Helsinki at SemEval-2020 Task 12 : Multilingual or language-specific BERT?

    This paper presents the different models submitted by the LT@Helsinki team for the SemEval-2020 Shared Task 12. Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively. In both cases we used the so-called Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained by Google and fine-tuned by us on the OLID dataset. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results. Peer reviewed.

    Pre-trained biomedical language models for clinical NLP in Spanish

    This work presents the first large-scale biomedical Spanish language models trained from scratch, using large biomedical corpora consisting of a total of 1.1B tokens and an EHR corpus of 95M tokens. We compared them against general-domain and other domain-specific models for Spanish on three clinical NER tasks. As our main result, our models are superior across the NER tasks, making them better suited to clinical NLP applications. Furthermore, our findings indicate that when enough data is available, pre-training from scratch is better than continual pre-training when tested on clinical tasks, raising an exciting research question about which approach is optimal. Our models and fine-tuning scripts are publicly available at HuggingFace and GitHub. This work was funded by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) within the framework of the Plan-TL. Peer reviewed. Postprint (published version).

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

    TFG 2016/2017

    With this publication, EINA, University School of Design and Art, affiliated with the Autonomous University of Barcelona, presents the Final Degree Projects submitted during the 2016-2017 academic year. Our hope is that this volume offers a more precise idea of the work EINA does in training new designers who can respond both professionally and intellectually to the needs and demands of our society. The educational work is oriented towards results that respond to the parameters of academic rigour and the capacity for contextual analysis, as well as to experimentation and the creation of new languages, while fostering design’s innovative potential.

    Multilingual identification of offensive content in social media

    In today’s society, a large number of social media users are free to express their opinions on shared platforms. The socio-cultural differences between the people behind those accounts (in terms of ethnicity, gender, sexual orientation, religion, politics, ...) give rise to a significant percentage of online discussions that use offensive language, which often negatively affects the psychological well-being of the victims. To address the problem, the endless stream of user-generated content calls for an accurate and scalable solution that detects offensive language using automated methods. This thesis explores different approaches to the offensiveness detection task, focusing on five languages: Arabic, Danish, English, Greek and Turkish. The results obtained using Support Vector Machines (SVM), Convolutional Neural Networks (CNN) and Bidirectional Encoder Representations from Transformers (BERT) are compared, achieving state-of-the-art results with some of the methods tested. The effect of the embeddings used, the dataset size, the class imbalance percentage and the addition of sentiment features are studied and analysed, as well as the cross-lingual capabilities of pre-trained multilingual models.

    Sensor GSM - Detecció passiva d'usuaris de telèfons mòbils

    In recent years, a wide range of sensors has been developed in the field of human presence detection for user-centered applications. Some of them focus not on detecting people themselves but on the electronic gadgets they usually carry, which are mostly mobile phones. On this basis, and taking into account that almost everyone owns a cellular phone nowadays, the overall goal of this thesis is to explore a new way to passively detect persons by means of their mobile phone emissions. Modern phones make use of several technologies, most of them suitable for building sensors. However, this thesis focuses exclusively on the Global System for Mobile Communications (GSM), which has been used for cellular telephony from the early nineties to the present day. The work relies on the literature as a starting point for a complete understanding of the intricacies of second-generation digital cellular networks. A method for detecting nearby mobile phones that are not currently being used is proposed. The method is implemented as a proof of concept using a Software Defined Radio (SDR) peripheral together with suitable open-source software to test the feasibility of the whole idea. While the detection of inactive mobile phones could not be shown using the proposed method, the reliable detection of actively used phones has proven to be possible. As the initial approach could not be implemented successfully, the sensor is limited to the detection of active mobile phone users.
